Slightly tweaking the traditional counterfactual, DiD asks: “what would happen to the trend for this unit had it never received the treatment?”
How would your rate of growth change if you ate more vegetables?
What would happen to inflation if the federal reserve lowered interest rates?
How would the Arab Spring have unfolded if participants lacked access to cell phones and social media?
Probably the oldest non-experimental method of causal inference (likely dates back to 1855)
Units must be observed before and after the “treatment”, so most commonly applied to panel data.
If assumptions are met, can control for both observed and unobserved confounding.
1854 Broad Street Cholera outbreak killed over 600 people in a poor district of London. What caused the outbreak?
What causes Cholera in general?
What interventions work?
The immediate and chief cause of diseases is atmospheric impurity arising from decomposing remnants of the substances used for food and from the impurities given out from their own bodies. (Neil Arnott, 1844)
Snow, however, found the initial outbreak was clustered around a single water pump on Broad Street (73 of the 83 initial deaths were nearer to the Broad Street pump than to any other).
Largely at Snow’s behest, the pump’s handle was removed, and the epidemic subsided, but does this tell us much? Outbreaks tend to subside!
Southwark and Vauxhall water company supplied 40,000+ homes from a reservoir that drew directly from the Thames
Supply had a well-established reputation for being…gross.
John Edwards “Sovereign of scented streams”
Lambeth waterworks, while it also drew from the Thames, moved their reservoir far upstream of the city in 1852.
| Water supply | Cholera deaths, 1849, rate per 100,000 | Cholera deaths, 1854, rate per 100,000 |
|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 |
| Lambeth Company Only | 847 | 193 |
Note that the companies have different starting points (Lambeth was already cleaner even by 1849), but miasma theory might lead you to expect the same trend.
If we can assume a parallel trend, then the relationship should look like this. The effect size, then, would be the difference between the counterfactual case and the observed case.
| Water supply | Cholera deaths, 1849, rate per 100,000 | Cholera deaths, 1854, rate per 100,000 | Difference in rates comparing 1854 to 1849, rate per 100,000 |
|---|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 | 117 |
| Lambeth Company Only | 847 | 193 | −654 |
| Difference-in-difference, Lambeth versus Southwark & Vauxhall | 502 | 1273 | −771 |
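The table's arithmetic can be checked directly — each company's change over time, then the difference between those changes:

```r
# Death rates per 100,000, from the table above
sv      <- c(before = 1349, after = 1466)  # Southwark & Vauxhall
lambeth <- c(before = 847,  after = 193)   # Lambeth

# Each company's change from 1849 to 1854
change_sv      <- unname(sv["after"] - sv["before"])            # 117
change_lambeth <- unname(lambeth["after"] - lambeth["before"])  # -654

# The difference in those differences: the estimated effect of
# moving the reservoir upstream
change_lambeth - change_sv  # -771
```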
Answers the question “what would have happened to the treated units if they had not received the treatment” (average treatment effect on the treated or ATT)
i.e. “if Lambeth had not moved the reservoir upstream, there would have been a parallel increase in the number of cholera deaths among their customers”
or “But for [the treatment] the trends between treated and control units should be parallel”
Does not require an assumption that observations are balanced on expected values of the outcome. Unobserved confounding only matters to the extent it impacts the trend.
Parallel trends: lines would be parallel but for the treatment
Exogeneity of treatment with respect to expected trends: treatment isn’t a response to baseline outcome or expected outcomes.
No spillover: untreated units aren’t impacted by treatment.
Stable groups: the before/after populations for each group are the same
For a simple 2-group x 2-time period DiD model, we can get this entire thing from a fairly simple OLS model:
\[ \hat{Y} = \beta_0 + \beta_1 \text{Time} + \beta_2 \text{Treated} + \beta_3 (\text{Time} \times \text{Treated}) \]
\(\beta_0\) The average for the control group at \(T=0\)
\(\beta_1\) The change for the control group between \(T=0\) and \(T=1\)
\(\beta_2\) The difference between the treated and control units at \(T=0\)
\(\beta_3\) The difference in changes over time for the treated group compared to the control group (the difference-in-differences estimate)
```r
library(tidyverse)
library(broom)  # for tidy()

df <- data.frame(
  period = factor(rep(c(0, 1), 2), labels = c("before", "after")),
  group  = factor(rep(c(0, 1), each = 2), labels = c("control", "treatment")),
  deaths = c(1349, 1466, 847, 193)
)

model <- lm(deaths ~ period * group, data = df)
tidy(model) |>
  select(term, estimate)
```

| term | estimate |
|---|---|
| (Intercept) | 1349 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |
In this setup, the interaction term represents our difference-in-differences estimate.
| term | estimate |
|---|---|
| (Intercept) | 1349 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |
| Water supply | Deaths 1849 | Deaths 1854 | 1854 - 1849 |
|---|---|---|---|
| S & V | 1349 | 1466 | 117 |
| Lambeth | 847 | 193 | −654 |
| DiD | 502 | 1273 | −771 |
Card and Krueger (1994): do minimum wage increases reduce employment rates? New Jersey raised its minimum wage from $4.25 to $5.05 in April 1992; neighboring Pennsylvania did not. The study compared fast-food employment in the two states before and after the increase.
Results (reproduced by Angrist and Krueger)
How might the parallel trends assumption be violated here? Some scenarios to consider:
If employers in New Jersey laid off employees in anticipation of the wage increase, then the job losses might have already happened by February
If stores that had layoffs failed to respond to the second wave of the survey, then the “after” period has a different composition compared to the first.
If Pennsylvania employers also raised wages in response to the New Jersey law, then the two trends aren’t really independent.
If New Jersey was more insulated from national economic trends than Pennsylvania, then the parallel trends assumption might not hold.
Card and Krueger address some of these with alternate model specifications. Since the DiD model is essentially OLS, they can include controls for wave 1 characteristics the same way you would in a regular regression model.
The assumption then becomes “trends are conditionally expected to be parallel”
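Since the model is just OLS, conditioning amounts to adding the wave-1 covariates to the same 2×2 specification. A minimal sketch with simulated data — `chain` stands in for a hypothetical wave-1 store characteristic; all numbers are made up:

```r
set.seed(42)

# Simulated store-level data: 'group' = 1 for treated-state stores,
# 'period' = 1 for the post-increase survey wave, 'chain' is a
# wave-1 characteristic we want to condition on
stores <- data.frame(
  period = rep(c(0, 1), each = 100),
  group  = rep(rep(c(0, 1), each = 50), times = 2),
  chain  = rep(sample(c("A", "B"), 100, replace = TRUE), times = 2)
)
stores$emp <- 20 - 2 * (stores$chain == "B") + stores$period +
  3 * stores$group - 1 * stores$period * stores$group + rnorm(200)

# Same interaction model as before, now conditional on 'chain'
model <- lm(emp ~ period * group + chain, data = stores)
coef(model)["period:group"]  # the DiD estimate; simulated truth is -1
```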
What about multiple cases or time periods? Or cases where observations are treated at different times?
For instance, what if I want to look at multiple states that passed minimum wage laws on different dates?
The difference-in-differences model is often generalized* to multiple groups/multiple periods by using a fixed effect for each group/time in place of the indicator for control vs. treatment cases:
\[ \hat{Y}_{gt} = \alpha_g + \gamma_t + \delta D_{gt} \]
\( \alpha_g = \text{Group fixed effect} \)
\( \gamma_t = \text{Time fixed effect} \)
\( D_{gt} = \text{Post-treatment indicator} \)
\( \delta = \text{Estimated treatment effect} \)
* Prepare for caveats on this!
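A sketch of that specification on a made-up state-year panel (all names and numbers are illustrative), with dummy variables serving as the fixed effects:

```r
library(tidyverse)
set.seed(7)

# Hypothetical panel: three states over four years; states B and C
# adopt a policy in different years (staggered treatment timing)
panel <- expand_grid(state = c("A", "B", "C"), year = 2001:2004) |>
  mutate(
    treated = (state == "B" & year >= 2003) | (state == "C" & year >= 2004),
    outcome = 10 + 2 * (state == "B") + 4 * (state == "C") +
      0.5 * (year - 2001) - 2 * treated + rnorm(n(), sd = 0.1)
  )

# Two-way fixed effects: a dummy per state (alpha_g), a dummy per
# year (gamma_t), and the post-treatment indicator (D_gt)
twfe <- lm(outcome ~ factor(state) + factor(year) + treated, data = panel)
coef(twfe)["treatedTRUE"]  # simulated truth is -2
```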
This isn’t really equivalent to the difference-in-differences model when there are multiple time periods and/or differences in treatment timing (see Imai, K., & Kim, I. S., 2021), and in most situations it won’t estimate a causal effect even if the parallel trends assumption holds.
But we have some alternative methods!
Is the parallel trends assumption plausible? The best research will attempt to justify this assumption using multiple lines of evidence like:
Placebo tests
Examining multiple time periods to see if trends are parallel long before some outcome.
Two-way fixed effects models might not be equivalent to the difference-in-differences method, so more recent analyses should account for this.
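One way to sketch a placebo test: restrict the data to the pre-treatment period and pretend treatment started partway through it. If trends really are parallel, the estimated “effect” should be near zero. The data below are simulated:

```r
set.seed(3)

# Simulated monthly outcomes; real treatment begins at month 7
df <- data.frame(
  group = rep(c("control", "treated"), times = 12),
  month = rep(1:12, each = 2)
)
df$y <- 5 + (df$group == "treated") + 0.2 * df$month -
  1.5 * (df$group == "treated" & df$month >= 7) + rnorm(24, sd = 0.1)

# Placebo: keep only pre-treatment months and pretend treatment
# began at month 4
pre <- subset(df, month < 7)
placebo <- lm(y ~ I(month >= 4) * group, data = pre)
coef(placebo)["I(month >= 4)TRUE:grouptreated"]  # should be near zero
```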